HPS: a hierarchical Persian stemming method
Abstract
In this paper, we present a novel hierarchical Persian stemming approach based on the part of speech (POS) of the word in a sentence. The implemented stemmer uses hash tables and several deterministic finite automata (DFAs) at the different levels of its hierarchy to remove the prefixes and suffixes of words. The hash tables serve two purposes in our method. First, the DFAs do not handle some special words correctly, and a hash table can partly solve this problem. Second, hash-table lookups speed up the stemmer by skipping the time the DFAs would otherwise need. Because of its hierarchical organization, the method is both fast and flexible. Our experiments on test sets from the Hamshahri Collection and from the Security News section of the ICTna.ir site show that our method achieves an average accuracy of 95.37%, and that accuracy improves further when the method is applied to a test set with common topics.
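The hierarchy described in the abstract, with a hash-table lookup in front of affix-stripping automata whose choice depends on the word's part of speech, can be sketched as below. This is a minimal illustration under stated assumptions, not the HPS implementation: the exception table, the affix lists, the POS tags `N`/`ADJ`/`V`, the `MIN_STEM` guard, and the names `AffixDFA` and `stem` are all hypothetical. Each automaton is built as a trie over its affixes, which is a DFA. Suffixes are matched by reading the word right to left, and the longest accepted affix is removed.

```python
"""A minimal sketch of a hash-table + DFA affix stripper in the spirit of HPS.

All affix lists, exceptions, POS tags, and MIN_STEM are illustrative
assumptions, not the rules used by the HPS authors.
"""

ZWNJ = "\u200c"  # zero-width non-joiner, often written before suffixes like "ها"
MIN_STEM = 2     # hypothetical guard: never strip a word below two letters


class AffixDFA:
    """DFA built as a trie over affixes; suffixes are read right to left."""

    def __init__(self, affixes, from_end):
        self.from_end = from_end
        self.delta = [{}]    # delta[state][char] -> next state
        self.accept = set()  # states where a complete affix ends
        for affix in affixes:
            self._add(affix)

    def _scan(self, text):
        return reversed(text) if self.from_end else iter(text)

    def _add(self, affix):
        state = 0
        for ch in self._scan(affix):
            if ch not in self.delta[state]:
                self.delta[state][ch] = len(self.delta)
                self.delta.append({})
            state = self.delta[state][ch]
        self.accept.add(state)

    def longest_match(self, word):
        """Length of the longest affix accepted at the start/end of word."""
        state, best = 0, 0
        for i, ch in enumerate(self._scan(word), start=1):
            state = self.delta[state].get(ch)
            if state is None:
                break
            if state in self.accept:
                best = i
        return best


# Level 1: hash table for words the DFAs would get wrong (hypothetical entries).
EXCEPTIONS = {
    "اخبار": "خبر",  # broken (Arabic-style) plural of "خبر" (news)
    "زمان": "زمان",  # "time": ends in "ان" but is not a plural
}

# Level 2: one (prefix DFA, suffix DFA) pair per POS tag (hypothetical tags).
RULES = {
    "N": (AffixDFA([], False), AffixDFA(["ها", "های", "ان"], True)),
    "ADJ": (AffixDFA([], False), AffixDFA(["تر", "ترین"], True)),
    "V": (AffixDFA(["می", "نمی"], False),
          AffixDFA(["م", "ی", "یم", "ید", "ند"], True)),
}


def stem(word, pos):
    word = word.replace(ZWNJ, "")  # normalize away ZWNJ before matching
    if word in EXCEPTIONS:         # average O(1) lookup skips the automata
        return EXCEPTIONS[word]
    if pos not in RULES:           # unknown POS: leave the word unchanged
        return word
    prefix_dfa, suffix_dfa = RULES[pos]
    n = prefix_dfa.longest_match(word)
    if len(word) - n >= MIN_STEM:
        word = word[n:]
    n = suffix_dfa.longest_match(word)
    if len(word) - n >= MIN_STEM:
        word = word[: len(word) - n]
    return word
```

For example, `stem("می‌روم", "V")` strips the prefix `می` and the person ending `م` to give the present stem `رو`, while `زمان` is returned unchanged because the hash table catches it before the noun DFA could wrongly strip `ان`. Checking the hash table first also lets irregular forms such as broken plurals, which no suffix rule can reduce, be resolved with a single lookup.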
Related articles
A new hybrid stemming algorithm for Persian
Stemming has been an influential component of information retrieval and search engines. There have been tremendous efforts to build stemmers that are both efficient and accurate. Stemmers follow one of three approaches: dictionary-based, statistical, and rule-based. This paper aims at building a hybrid stemmer that uses both the dictionary-based method and the rule-based...
A Bottom Up approach to Persian Stemming
Stemmers have many applications in natural language processing and in fields such as information retrieval. Many stemming algorithms have been proposed. In this paper, we propose a new algorithm for the Persian language. Our algorithm is a bottom-up algorithm that can be reorganized without changing its implementation. Our experiments show that the proposed algorithm has a suitable resu...
A New Method for Stemming in Persian Language Considering Exceptions
In this paper, a new stemming algorithm for the Farsi language is presented. The stemmer is based on removing suffixes and prefixes, but it uses a database of exceptions to decrease the error rate. The proposed method improves both the speed of the stemmer and its error rate. The evaluation results on a small Farsi document collection show significant improvement in precisio...
Ad Hoc Retrieval with the Persian Language
This paper describes our participation in the Persian ad hoc search task of the CLEF 2009 evaluation campaign. In this task, we suggest using a light suffix-stripping algorithm for the Farsi (or Persian) language. The evaluations based on different probabilistic models demonstrated that our stemming approach performs better than a stemmer removing only the plural suffixes, or statistically bette...
Evaluation of Perstem: A Simple and Efficient Stemming Algorithm for Persian
Persian is a challenging language in the field of NLP. Right-to-left orthography, complex morphology, complicated grammatical rules, and different forms of letters make it an interesting language for NLP research. In this paper, we measure the effectiveness of a simple and efficient stemming algorithm, Perstem, on Persian information retrieval. Our experiments on the Hamshahri corpus at CLEF2009 ...
Journal: CoRR
Volume: abs/1403.2837
Publication date: 2014